STA4173 Lecture 6, Summer 2023
We previously discussed testing three or more means using ANOVA.
We also discussed that ANOVA is an extension of the two-sample t-test.
Recall that the t-test has two assumptions:
Equal variance between groups.
Normal distribution.
We will extend our knowledge of checking assumptions today.
We can represent ANOVA with the following model: y_{ij} = \mu + \tau_i + \varepsilon_{ij}
where \mu is the overall mean, \tau_i is the effect of treatment (group) i, and \varepsilon_{ij} is the error term.
We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
Very important note: the assumption is on the error term and NOT on the outcome!
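As a quick illustration, we can simulate directly from this model and fit the ANOVA; all numbers below (three groups of five, \mu = 10, \tau = (-2, 0, 2), \sigma = 1) are hypothetical, chosen only for demonstration.

```r
# Simulate y_ij = mu + tau_i + eps_ij with iid N(0, sigma^2) errors.
# All values below are hypothetical, chosen only for illustration.
set.seed(1)
k <- 3; n_i <- 5                          # three groups of five
mu  <- 10                                 # overall mean
tau <- c(-2, 0, 2)                        # group (treatment) effects
group <- factor(rep(1:k, each = n_i))
eps <- rnorm(k * n_i, mean = 0, sd = 1)   # iid N(0, 1) errors
y <- mu + tau[group] + eps                # the ANOVA model
summary(aov(y ~ group))
```

Note that the normality and constant-variance assumptions are imposed on eps, not on y itself.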
We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
Normality: histogram of residuals
Normality: quantile-quantile plot
Variance: scatterplot of the residuals against the predicted values
We will construct a matrix of graphs to assess assumptions.
If we decide that the variance is questionable, we can (and will) formally test it.
I always base my final decision about normality on the q-q plot.
R syntax to put together the matrix of graphs (credit: former graduate student, Reid Ginoza)
library(tidyverse)
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
17.2, 14.3, 17.6, 21.6, 17.5,
5.5, 7.7, 12.2, 11.4, 16.4,
11.0, 12.4, 13.5, 8.9, 8.1)
system <- c(rep("Cojet",5), rep("Silistor",5), rep("Cimara",5), rep("Ceramic",5))
data <- tibble(system, strength)
m <- aov(strength ~ system, data = data)
summary(m)
            Df Sum Sq Mean Sq F value  Pr(>F)
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
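As a sketch of the graphical checks described earlier (the matrix of graphs), the residual plots for this model can be drawn with base R:

```r
# Residual diagnostics for the bonding-strength model above:
# histogram and q-q plot for normality, residuals vs. fitted
# values for constant variance.
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
              5.5, 7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5, 8.9, 8.1)
system <- c(rep("Cojet", 5), rep("Silistor", 5),
            rep("Cimara", 5), rep("Ceramic", 5))
m <- aov(strength ~ system)

e    <- residuals(m)   # e_ij = y_ij - yhat_ij
yhat <- fitted(m)

par(mfrow = c(1, 3))
hist(e, main = "Histogram of residuals", xlab = "Residual")
qqnorm(e); qqline(e)                        # normality check
plot(yhat, e, main = "Residuals vs. fitted",
     xlab = "Predicted value", ylab = "Residual")  # variance check
abline(h = 0, lty = 2)
```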
We can formally check the variance assumption with the Brown-Forsythe-Levene (BFL) test.
The test statistic is calculated as follows, F_0 = \frac{\sum_{i=1}^k n_i (\bar{z}_i - \bar{z})^2/(k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i}(z_{ij}-\bar{z}_i)^2/(n-k) }, where z_{ij} = |y_{ij} - \tilde{y}_i| is the distance from the median of group i, \bar{z}_i is the mean of the z_{ij} in group i, \bar{z} is the overall mean of the z_{ij}, n_i is the size of group i, n is the total sample size, and k is the number of groups.
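As a sketch, the statistic can be computed by hand in R for the bonding-strength data, centering at the group medians (the Brown-Forsythe version), and checked against a one-way ANOVA performed on the z values:

```r
# Hand computation of the BFL statistic F0 for the bonding-strength
# data; z_ij = |y_ij - median_i|, centering at the group median.
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
              5.5, 7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5, 8.9, 8.1)
system <- factor(c(rep("Cojet", 5), rep("Silistor", 5),
                   rep("Cimara", 5), rep("Ceramic", 5)))
z <- abs(strength - ave(strength, system, FUN = median))
k <- nlevels(system); n <- length(z)
n_i    <- tapply(z, system, length)     # group sizes
zbar_i <- tapply(z, system, mean)       # group means of z
zbar   <- mean(z)                       # overall mean of z
F0 <- (sum(n_i * (zbar_i - zbar)^2) / (k - 1)) /
      (sum((z - zbar_i[system])^2) / (n - k))
p  <- pf(F0, k - 1, n - k, lower.tail = FALSE)   # p-value
anova(aov(z ~ system))   # same F statistic, as a check
```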
Brown-Forsythe-Levene Test for Homoskedasticity
Hypotheses
Test Statistic
p-Value
Rejection Region
We will use the leveneTest() function from the car package.
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
Recall the Palmer Penguin data in R.
We determined that there is a difference in bill lengths between the three species.
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
We also discussed how to assess the assumptions:
Graphically using the almost_sas() function.
Confirming the variance assumption using the BFL.
If we break an assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis.
If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis.
The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.
Our new hypotheses are
Alternatively,
The test statistic is as follows: H = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1), where R_i is the sum of the ranks of group i, n_i is the size of group i, n is the total sample size, and k is the number of groups.
H follows a \chi^2 distribution with k-1 degrees of freedom.
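As a sketch, H can be computed by hand for the bonding-strength data from earlier and checked against kruskal.test(); note that kruskal.test() also applies a small correction for ties (17.2 appears twice in these data), so its reported statistic differs slightly from the uncorrected H.

```r
# Hand computation of H for the bonding-strength data, checked
# against kruskal.test().
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
              5.5, 7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5, 8.9, 8.1)
system <- factor(c(rep("Cojet", 5), rep("Silistor", 5),
                   rep("Cimara", 5), rep("Ceramic", 5)))
n <- length(strength); k <- nlevels(system)
r   <- rank(strength)             # midranks assigned to tied values
R_i <- tapply(r, system, sum)     # rank sum per group
n_i <- tapply(r, system, length)  # group sizes
H <- 12 / (n * (n + 1)) * sum(R_i^2 / n_i) - 3 * (n + 1)
pchisq(H, df = k - 1, lower.tail = FALSE)  # p-value
kruskal.test(strength ~ system)            # tie-corrected version
```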
Hypotheses
Test Statistic
p-Value
Rejection Region
We will use the kruskal.test() function to perform the Kruskal-Wallis test.
A family doctor wants to determine if the distributions of HDL cholesterol in males for the age groups 20 to 29 years, 40 to 49 years, and 60 to 69 years old are different.
He obtains a simple random sample of 12 individuals from each age group and determines their HDL cholesterol.
Do the data indicate the distributions vary depending on age? Use the \alpha = 0.05 level of significance.
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
We can also perform post-hoc testing in the Kruskal-Wallis setting.
The set-up is just like Tukey's: we can perform all pairwise comparisons while controlling the Type I error rate.
Instead of using |\bar{y}_i - \bar{y}_j|, we will use |\bar{R}_i - \bar{R}_j|, where \bar{R}_i is the average rank of group i.
The comparison we are making: conclude groups i and j differ if |\bar{R}_i - \bar{R}_j| > z_{\alpha/(k(k-1))} \sqrt{\frac{n(n+1)}{12}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}.
We will use the kruskalmc() function from the pgirmess package to perform the Kruskal-Wallis post-hoc test.
Revisiting our example,
Note that if we were doing this “for real” we would not do the posthoc test since we did not see a difference between the three groups.
We are doing it here for demonstration purposes.
Multiple comparison test after Kruskal-Wallis
p.value: 0.05
Comparisons
obs.dif critical.dif difference
20 to 29-40 to 49 2.583333 10.2969 FALSE
20 to 29-60 to 69 4.291667 10.2969 FALSE
40 to 49-60 to 69 1.708333 10.2969 FALSE
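The critical.dif column can be reproduced by hand; this sketch assumes kruskalmc() uses the usual large-sample critical difference z_{\alpha/(k(k-1))}\sqrt{n(n+1)/12\,(1/n_i + 1/n_j)}, which agrees with the output above for this balanced design.

```r
# Reproduce critical.dif for the HDL example: k = 3 age groups
# of n_i = 12 each, so n = 36 and k(k-1)/2 = 3 pairwise comparisons.
k <- 3; n_i <- 12; n <- k * n_i
alpha <- 0.05
z <- qnorm(1 - alpha / (k * (k - 1)))   # Bonferroni-style critical z
crit <- z * sqrt(n * (n + 1) / 12 * (1 / n_i + 1 / n_i))
crit  # approximately 10.2969, matching the output above
```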
Recall the Palmer Penguin data in R.
Let’s check the differences in bill lengths using the Kruskal-Wallis.
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
Multiple comparison test after Kruskal-Wallis
p.value: 0.05
Comparisons
obs.dif critical.dif difference
Adelie-Chinstrap 184.14584 34.56759 TRUE
Adelie-Gentoo 157.73868 28.74910 TRUE
Chinstrap-Gentoo 26.40716 35.76841 FALSE
Adelies are different from both chinstraps and gentoos.
Chinstraps and gentoos are not different.